Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions
Neural networks rely on convolutions to aggregate spatial information.
However, spatial convolutions are expensive in terms of model size and
computation, both of which grow quadratically with respect to kernel size. In
this paper, we present a parameter-free, FLOP-free "shift" operation as an
alternative to spatial convolutions. We fuse shifts and point-wise convolutions
to construct end-to-end trainable shift-based modules, with a hyperparameter
characterizing the tradeoff between accuracy and efficiency. To demonstrate the
operation's efficacy, we replace ResNet's 3x3 convolutions with shift-based
modules for improved CIFAR10 and CIFAR100 accuracy using 60% fewer parameters;
we additionally demonstrate the operation's resilience to parameter reduction
on ImageNet, outperforming ResNet family members. We finally show the shift
operation's applicability across domains, achieving strong performance with
fewer parameters on classification, face verification and style transfer.
Comment: Source code will be released afterward
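To make the shift idea in the abstract above concrete, here is a minimal sketch in plain Python. It is not the authors' released code. The function names (`shift_directions`, `shift`, `conv_params`), the round-robin assignment of channels to offsets, and the zero-filling at the borders are all assumptions made for illustration. The shift itself only moves data, so it has no weights and performs no multiply-adds; in the modules described above, channel mixing is left to the point-wise (1x1) convolution.

```python
from itertools import product


def shift_directions(kernel_size):
    """Return all (dy, dx) offsets in a kernel_size x kernel_size neighbourhood."""
    r = kernel_size // 2
    return list(product(range(-r, r + 1), repeat=2))


def shift(x, kernel_size=3):
    """Shift each channel of x (a C x H x W nested list) by one fixed offset.

    Assumption: channels are assigned to offsets round-robin, and pixels
    shifted in from outside the feature map are filled with zeros.
    """
    offsets = shift_directions(kernel_size)
    out = []
    for c, channel in enumerate(x):
        dy, dx = offsets[c % len(offsets)]
        h, w = len(channel), len(channel[0])
        shifted = [[0] * w for _ in range(h)]
        for i in range(h):
            for j in range(w):
                si, sj = i - dy, j - dx  # source pixel for output (i, j)
                if 0 <= si < h and 0 <= sj < w:
                    shifted[i][j] = channel[si][sj]
        out.append(shifted)
    return out


def conv_params(c_in, c_out, kernel_size):
    """Weight count of a dense convolution (bias ignored)."""
    return kernel_size * kernel_size * c_in * c_out


if __name__ == "__main__":
    # Nine 3x3 channels, each with a single 1 in the centre.
    x = [[[0, 0, 0], [0, 1, 0], [0, 0, 0]] for _ in range(9)]
    for channel in shift(x):
        print(channel)
    print(conv_params(64, 64, 3))  # 3x3 convolution: 36864 weights
    print(conv_params(64, 64, 1))  # point-wise convolution: 4096 weights
```

For equal channel counts, the point-wise convolution carries k^2 times fewer weights than a k x k convolution, which is the quadratic growth the abstract refers to. The overall saving for a whole network depends on how the modules are configured and is not derived here.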
Distinctive action sketch for human action recognition
Recent developments in the field of computer vision have led to a renewed interest in sketch-related research. Considerable evidence has emerged for the significance of sketches. However, there has been little in-depth discussion of sketch-based action analysis so far. In this paper, we propose an approach to discover the most distinctive sketches for action recognition. The action sketches should satisfy two characteristics: sketchability and objectiveness. Primitive sketches are prepared using fast edge detection based on structured forests. Meanwhile, we use Faster R-CNN to detect persons in parallel. Once these two stages are complete, distinctive action sketch mining is carried out. After that, we present four kinds of sketch pooling methods to obtain a uniform representation for action videos. The experimental results show that the proposed method achieves impressive performance against several compared methods on two public datasets.
The work was supported in part by the National Science Foundation of China (61472103, 61772158, 61702136, and 61701273) and Australian Research Council (ARC) grant (DP150104645).
Dynamic Causal Disentanglement Model for Dialogue Emotion Detection
Emotion detection is a critical technology extensively employed in diverse
fields. While the incorporation of commonsense knowledge has proven beneficial
for existing emotion detection methods, dialogue-based emotion detection
encounters numerous difficulties and challenges due to human agency and the
variability of dialogue content. In dialogues, human emotions tend to accumulate
in bursts. However, they are often implicitly expressed. This implies that many
genuine emotions remain concealed within a plethora of unrelated words and
dialogues. In this paper, we propose a Dynamic Causal Disentanglement Model
founded on the separation of hidden variables. This model decomposes the content
of dialogues
and investigates the temporal accumulation of emotions, thereby enabling more
precise emotion recognition. First, we introduce a novel Causal Directed
Acyclic Graph (DAG) to establish the correlation between hidden emotional
information and other observed elements. Subsequently, our approach utilizes
pre-extracted personal attributes and utterance topics as guiding factors for
the distribution of hidden variables, aiming to separate irrelevant ones.
Specifically, we propose a dynamic temporal disentanglement model to infer the
propagation of utterances and hidden variables, enabling the accumulation of
emotion-related information throughout the conversation. To guide this
disentanglement process, we leverage ChatGPT-4.0 and LSTM networks to
extract utterance topics and personal attributes as observed
information. Finally, we test our approach on two popular datasets in dialogue
emotion detection, and the experimental results verify the model's
superiority.
- …